Snowpark Custom Transformation Job

Calibo Accelerate supports custom transformation using Snowpark with Snowflake as source and target data lake.

This lets you perform data transformation entirely within the Snowflake ecosystem, reducing data movement. Because compute, transformation, and storage are all managed within Snowflake, both the architecture and its governance are simplified. Snowpark supports custom transformation scripts written in Python, Scala, or Java, so you can choose the language that best fits your team.

To create a Snowpark custom transformation job:

  1. Sign in to the Calibo Accelerate platform and navigate to Products.

  2. Select a product and feature. Click the Develop stage of the feature and navigate to Data Pipeline Studio.

  3. Create a pipeline with the following stages:

    Data Lake > Data Transformation > Data Lake

    Note: The stages and technologies in this pipeline are used only as an example.

  4. In the data lake nodes, add Snowflake and configure the nodes.

  5. In the data transformation stage, click the arrow for Snowpark and then click Add to add one of the following options to the stage, depending on your use case:

    • Snowpark Java - Gradle

    • Snowpark Java - Maven

    • Snowpark Python

    Select the language for custom transformation

  6. To add the selected technology you can do one of the following:

    • Use Existing Repository - Turn on the toggle to use an existing repository configured for the selected technology and do the following:

      • Provide the technology title.

      • Select a repository from the dropdown.

    • Create New Repository - Turn off the Use Existing Repository toggle and provide the following information:

      • Technology Title - A name for the technology for which you are creating the repository.

      • Organization - The organization that owns the repository.

      • Repository Name - A name for the repository that is created for the technology.

      • Visibility - Specifies who can view and access the repository.

  7. Connect the data transformation node to the source and target nodes.

  8. To replace the placeholder code with your own custom code, do the following:

    1. Click Source Code and open the main.py file.

    2. Click Edit.

    3. Either replace the complete code, or replace the placeholders Table 1, Table 2, Column 1, and Column 2 with actual values.

    4. Click Commit Changes.

    5. Open the test_main.py file and repeat steps 3 and 4.
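
    As a minimal sketch of what main.py might contain after the placeholders are replaced: the transformation below reads a source table, keeps two columns, filters out rows with NULL keys, and overwrites a target table. All table and column names here are hypothetical, and the `fully_qualified` helper is our own addition for illustration; adapt everything to your actual schema.

    ```python
    def fully_qualified(database, schema, table):
        """Build a fully qualified Snowflake table name (illustrative helper)."""
        return f"{database}.{schema}.{table}"


    def transform(session, source_table, target_table):
        """Read the source table, apply a simple transformation, write the result.

        `session` is a snowflake.snowpark.Session provided by the job runtime.
        Table and column names are placeholders - replace them with your own.
        """
        # Imported inside the function so the module can be loaded and unit
        # tested even where snowflake-snowpark-python is not installed.
        from snowflake.snowpark.functions import col

        df = session.table(source_table)
        # Example transformation: keep two columns and drop NULL keys.
        result = df.select(col("COLUMN_1"), col("COLUMN_2")).filter(
            col("COLUMN_1").is_not_null()
        )
        # Overwrite the target table in the Snowflake data lake.
        result.write.mode("overwrite").save_as_table(target_table)
    ```

    A corresponding test_main.py would typically exercise `transform` against a test session or, as here, cover pure helpers such as `fully_qualified` that need no live Snowflake connection.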

  9. Click the data transformation node, select the branch on which you want the pipeline to be built, and then click Trigger Build. Click Refresh to see the latest build status.
  10. Once the Jenkins build is complete, click Create Custom Job.
  11. Complete the job creation steps to create the Snowpark custom transformation job.

  12. Click Publish to publish the pipeline with the changes.

  13. Click Run Pipeline to run the job or pipeline.

  14. If the job fails, click Source Code. Based on the error, edit the code and commit the changes. When you click the Snowpark node, the following message is displayed:

    "A new code commit has been detected after the last build. Rebuild the pipeline to reflect the new changes." Click Rebuild.

What's next? Snowflake Custom Transformation Job